Extracting Paraphrases of Technical Terms from Noisy Parallel Software Corpora
Abstract
In this paper, we study the problem of extracting technical paraphrases from a parallel software corpus, namely, a collection of duplicate bug reports. Paraphrase acquisition is a fundamental task in the emerging area of text mining for software engineering. Existing paraphrase extraction methods are not entirely suitable here due to the noisy nature of bug reports. We propose a number of techniques to address the noisy data problem. The empirical evaluation shows that our method significantly improves an existing method by up
Similar resources
A Corpus-based Method for Extracting Paraphrases of Emotion Terms
Since paraphrasing is one of the crucial tasks in natural language understanding and generation, this paper introduces a novel technique to extract paraphrases for emotion terms, from non-parallel corpora. We present a bootstrapping technique for identifying paraphrases, starting with a small number of seeds. WordNet Affect emotion words are used as seeds. The bootstrapping approach learns extr...
Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation
Previous work has shown that high quality phrasal paraphrases can be extracted from bilingual parallel corpora. However, it is not clear whether bitexts are an appropriate resource for extracting more sophisticated sentential paraphrases, which are more obviously learnable from monolingual parallel corpora. We extend bilingual paraphrase extraction to syntactic paraphrases and demonstrate its a...
Applicability Analysis of Corpus-derived Paraphrases toward Example-based Paraphrasing
Two kinds of paraphrases extracted from a bilingual parallel corpus were analyzed. One is from an adjectival predicate sentence to a non-adjectival one. The other is from a passive form to a non-passive form. The ability to extract paraphrases is strongly desired for paraphrasing studies. Although extracting paraphrases from multi-lingual parallel corpora is possible, the type of paraphrases ex...
Extracting Paraphrases from Definition Sentences on the Web
We propose an automatic method of extracting paraphrases from definition sentences, which are also automatically acquired from the Web. We observe that a huge number of concepts are defined in Web documents, and that the sentences that define the same concept tend to convey mostly the same information using different expressions and thus contain many paraphrases. We show that a large number of ...
Extracting Lay Paraphrases of Specialized Expressions from Monolingual Comparable Medical Corpora
Whereas multilingual comparable corpora have been used to identify translations of words or terms, monolingual corpora can help identify paraphrases. The present work addresses paraphrases found between two different discourse types: specialized and lay texts. We therefore built comparable corpora of specialized and lay texts in order to detect equivalent lay and specialized expressions. We ide...